Resampling methods

ABSTRACT

Polyphase filtering, such as resampling for image resizing, on a processor with parallel output units is cast in terms of data access blocks and data coverage charts to increase processor efficiency. Implementations corresponding to input resampling factors are generated automatically by computation cost comparisons.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority from provisional application No. 60/420,319, filed Oct. 22, 2002.

BACKGROUND OF THE INVENTION

[0002] The present invention relates to digital signal processing, and more particularly to resampling to adjust sampling rates.

[0003] Various consumer products use digital signals, such as music on CDs, images in digital cameras, and video on DVDs, and multiple sampling rates have been used to create the digital files. The playout/display device for such a digital file may require a sampling rate differing from that of the digital file to be played, and thus resampling to adjust the sampling rate is needed. For example, music may be sampled at 16 KHz, 44.1 KHz, or 48 KHz, and images at 1600×1200 pixels or 640×480 pixels. The resampling factor is the ratio of the new sampling rate divided by the original sampling rate.

[0004] It is generally easier to implement resampling when the resampling factor is either an integer (upsampling) or the reciprocal of an integer (downsampling). Fractional resampling (resampling factor is U/D where U and D are integers greater than 1) is more complicated to implement but frequently required in real applications. For example, the digital zoom feature of camcorders and digital cameras often involves a series of finely-spaced zoom factors such as 1.1×, 1.2×, 1.3×, and so on.

[0005] Crochiere et al, Multirate Digital Signal Processing (Prentice-Hall 1983) includes resampling theory and structures. In particular, FIG. 2a shows generic resampling (a rate converter) which first expands the sampling rate by a factor of U, lowpass filters to eliminate aliasing, and then compresses the sampling rate by a factor of D. The sampling rate expansion is just inserting 0s, and the sampling rate compression is just discarding samples. The lowpass filter leads to computational complexity, and a polyphase filter implementation as illustrated in FIG. 2b helps avoid unnecessary multiplications and additions. However, such a polyphase filter implementation inherently requires irregular data access in the sense that input/output addressing involves fractional arithmetic.

[0006] Generally, single-thread, VLIW (very long instruction word), SIMD (single instruction, multiple data), and vector DSP processor architectures have a high level of efficiency for multiply-accumulate (MAC) operations with regular data access in the sense of simple, well-behaved, multi-dimensional addressing. In a conventional single-thread DSP, simple and regular data access is sometimes free but otherwise requires little computation time. In a VLIW DSP, simple and regular data access can execute simultaneously with MAC instructions, and thus is often free. A SIMD DSP often requires that the data be organized sequentially to align with the wide memory/register word, so simple and regular access is mandatory in order to take advantage of the SIMD features. A vector DSP usually has hardware address generation and loop control, and these hardware resources cannot deal with anything but simple and regular addressing. Straightforward implementation of fractional resampling on various digital signal processor architectures is thus fairly inefficient.

[0007] Thus there is a problem of adapting polyphase filter resampling methods for efficient operation on DSPs.

SUMMARY OF THE INVENTION

[0008] The present invention provides regular data addressing for polyphase filtering of resampling by

BRIEF DESCRIPTION OF THE DRAWINGS

[0009] The drawings are heuristic for clarity.

[0010] FIG. 1a is a flow diagram.

[0011] FIG. 1b shows a digital camera.

[0012] FIGS. 2a-2c illustrate resampling.

[0013] FIGS. 3a-3c are graphs of upsampling polyphase filters and corresponding data access block.

[0014] FIGS. 4a-4b are graphs of downsampling polyphase filters and corresponding data access block.

[0015] FIG. 5 lists example implementations.

[0016] FIGS. 6a-6b illustrate architecture kernels.

[0017] FIGS. 7a-7c show an example data access block and two access coverage charts for differing parallel outputs.

[0018] FIGS. 8a-8d show an example access coverage chart and horizontal plus vertical filtering implementations.

[0019] FIG. 9 graphs the sinc, window, and windowed sinc functions.

[0020] FIG. 10 shows graphs of sub-filters of windowed sinc.

[0021] FIG. 11 shows offset of windowed sinc.

[0022] FIG. 12 lists data access patterns and access coverage charts.

[0023] FIG. 13 lists parameters of an example.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0024] 1. Overview

[0025] The preferred embodiment methods of resampling using a processor with multiple parallel execution units (e.g., multiply-accumulate units) include run-time implementation analysis in response to an input resampling ratio as illustrated in FIG. 1a. Resampling of images uses horizontal and vertical filtering passes with filter coefficients derived from a windowed sinc filter. The methods implement a polyphase sub-filter structure and pick execution unit architecture and implementation parameters to minimize computation cost (e.g., sub-filter length).

[0026] The preferred embodiment methods apply to a variety of platforms, including conventional single-thread, VLIW (very long instruction word), SIMD (single instruction, multiple data), and vector DSP processor architectures. For example, FIG. 1b shows a digital camera with an image accelerator (IMX) which includes multiple (e.g., 4 or 8) parallel MAC units. Zoom selection input drives optical zoom (if any) and/or electronic resampling zoom that invokes stored and/or computed resampling filterings which apply the IMX parallel processing capabilities to captured images.

[0027] 2. Upsampling, Downsampling, and Fractional Resampling

[0028] First consider the polyphase filter structure of FIG. 2b for the upsampling (sampling rate expander plus lowpass filtering) by an integral factor of U in FIG. 2a. Let x(n) be the input stream of samples; then the insertion of U−1 0s between successive samples x(n) yields the expanded-sampling-rate sample sequence u(k): $u(k) = \begin{cases} x(n) & \text{if } k = nU \\ 0 & \text{if } k \text{ is not a multiple of } U \end{cases}$
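As a minimal sketch of the sampling rate expander (Python is used here purely for illustration, and the function name `expand` is hypothetical rather than taken from the source), zero insertion can be written as:

```python
def expand(x, U):
    """Zero-insertion rate expander: u[n*U] = x[n], every other entry is 0.

    Note: this sketch also appends U-1 zeros after the last sample so that
    the output length is exactly len(x) * U.
    """
    u = [0] * (len(x) * U)
    u[::U] = x  # place each input sample on a multiple of U
    return u


print(expand([1, 2, 3], 3))
```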

[0029] The anti-aliasing lowpass filter h(k) can thus have a passband of at most 1/U to avoid aliasing. In general, an FIR filter is preferred over an IIR filter: FIR filters can have linear phase and are easier to implement on most DSP platforms. Presume that the lowpass filter is an FIR filter with a length-L kernel h(k); then the upsampled output y(k) is given by the convolution:

y(k)=Σ_(0≦j≦L−1) u(k−j)h(j)

[0030] The lowpass filter in practice typically is a windowed version of the standard sinc lowpass filter kernel and thus symmetric, so for notational convenience replace j with −j in the sum so the convolution looks formally like an inner product. (When the lowpass filter kernel is not symmetric, reverse-ordering the coefficients accomplishes the same effect.)

[0031] The length, L, of the filter kernel is a tradeoff between anti-aliasing performance and computational complexity. In general, the longer the filter kernel, the better the output quality, at the expense of more computation and longer latency.
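To make the symmetric windowed sinc kernel concrete, the following is a rough sketch only. The Hamming window, the 1/factor gain scaling, and the name `windowed_sinc` are all assumptions chosen for illustration; the text does not state which window or normalization is used.

```python
import math


def windowed_sinc(L, factor):
    """L-tap symmetric lowpass kernel (L > 1) with cutoff at 1/factor of the band.

    Assumptions (not from the source): Hamming window, gain scaled by 1/factor.
    """
    center = (L - 1) / 2
    taps = []
    for j in range(L):
        t = (j - center) / factor
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * j / (L - 1))
        taps.append(sinc * window / factor)
    return taps


h = windowed_sinc(11, 3)  # 11 taps, factor 3: the largest tap is h[5]
```

A longer L narrows the transition band at the cost of more multiply-accumulates per output, which is the tradeoff described above.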

[0032] Further, in the sampling rate expanded sequence, u(k), most of the samples (at least U−1 out of every U) equal 0, so the filtering computation has many multiplications by 0. A polyphase filter implementation avoids these superfluous multiplications by splitting the filter kernel into U phases, and cycling through the phases. In particular, define the U sub-filter kernels by downsampling the filter kernel by a factor of U for each:

H₀(k)=h(Uk)

H₁(k)=h(Uk+1)

H₂(k)=h(Uk+2) . . .

H_(U−1)(k)=h(Uk+U−1)

[0033] Thus the filtering of u(k) with h(k) can be rewritten: $\begin{aligned} y(k) &= \sum_{0 \leq j \leq L-1} u(k+j)\,h(j) \\ &= \sum_{0 \leq i \leq (L-1)/U} x(m+i)\,H_{n}(i) \quad \text{when } k = mU + n \text{ for } 0 \leq n \leq U-1 \end{aligned}$

[0034] FIG. 2b illustrates this periodic cycling of the sub-filter kernels with each sub-filter kernel only of length at most ceiling[L/U]. The original sample sequence, x(n), is sent simultaneously to all of these U sub-filters, but the filters operate at the input rate. The upsampling by a factor of U comes from the sequential clocking out of the outputs of the U sub-filters, one into each output time slot. Each output sample requires one sub-filtering job, involving about L/U multiplications. This represents a reduction by a factor of U in computation complexity.
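The equivalence between zero-insertion filtering and cycling through sub-filters can be checked with a short sketch. It assumes the inner-product form w(k)=Σ u(k+j)h(j) and zero extension past the end of the input; under that convention, working out which taps meet nonzero samples gives phase (−k) mod U for output k, which is an inference from the definitions rather than a statement in the text. The function names are hypothetical.

```python
def upsample_direct(x, h, U):
    """Zero-insert by U, then filter every output with all L taps."""
    u = [0] * (len(x) * U + len(h))      # trailing zeros keep u[k + j] in range
    u[:len(x) * U:U] = x
    return [sum(u[k + j] * h[j] for j in range(len(h)))
            for k in range(len(x) * U)]


def upsample_polyphase(x, h, U):
    """Same outputs, touching only the taps that meet nonzero samples."""
    phases = [h[p::U] for p in range(U)]  # H_p(i) = h(U*i + p)
    xp = list(x) + [0] * len(h)           # zero extension past the input end
    y = []
    for k in range(len(x) * U):
        p = (-k) % U                      # sub-filter used for output k
        start = (k + p) // U              # first input sample it touches
        y.append(sum(xp[start + i] * c for i, c in enumerate(phases[p])))
    return y


x = [1, 2, 3, 4, 5]
h = list(range(1, 12))                    # 11 placeholder taps, h(j) = j + 1
assert upsample_polyphase(x, h, 3) == upsample_direct(x, h, 3)
```

Each polyphase output uses at most ceiling[L/U] multiplications, versus L in the direct form.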

[0035] For downsampling by an integer factor of D, the preliminary lowpass filtering must reduce the bandwidth by a factor of D and then the downsampling retains only every Dth filter output sample and discards the rest. The center plus righthand portions of FIG. 2a show the lowpass filtering by h(k) and downsampling by D. Generally, for an input sequence u(n) the lowpass filtering with h(k) is:

w(k)=Σ_(0≦j≦L−1) u(k+j)h(j)

[0036] This is the same as previously described but not requiring the special form of u(n) as an upsampled input sequence. Then the downsampling is:

y(n)=w(nD)

[0037] Again, there is inefficiency in a straightforward implementation: computing all of the filterings w(k) is unnecessary because D−1 out of every D are discarded. Indeed, $\begin{aligned} y(n) &= w(nD) \\ &= \sum_{0 \leq j \leq L-1} u(nD+j)\,h(j) \end{aligned}$

[0038] so the input samples are shifted by D samples for each filtering.
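A small sketch of this shortcut, under the assumption that only outputs with full filter support are wanted (the name `decimate` is hypothetical):

```python
def decimate(u, h, D):
    """Compute y(n) = sum_j u(n*D + j) * h(j) for retained outputs only."""
    L = len(h)
    n_out = (len(u) - L) // D + 1         # outputs whose taps all fit inside u
    return [sum(u[n * D + j] * h[j] for j in range(L))
            for n in range(n_out)]


u = list(range(20))
h = [1, 1, 1]
# Filtering every position and then keeping every 4th result gives the same values.
full = [sum(u[k + j] * h[j] for j in range(3)) for k in range(len(u) - 2)]
assert decimate(u, h, 4) == full[::4]
```

The input window simply advances by D samples per output, so the D−1 discarded filterings are never computed.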

[0039] FIG. 2a shows an overall resampling by a factor of U/D (U and D may be taken as relatively prime integers greater than 1); the lowpass filter reduces bandwidth by a factor of max(U,D) to avoid aliasing. Again, a straightforward implementation has computational inefficiencies. However, combining the foregoing upsampling and downsampling leads to the implementation of FIG. 2c: an upsampling by polyphase filtering and downsampling by control of the output selection switch to the sub-filters: output y(n) comes from sub-filter −nD mod[U]. This means output y(n) will pick up one in every D samples from the bank of U sub-filters. Input data access can be obtained by sliding the filter envelope according to output phase, as in the integer-factor upsampling polyphase implementation.

[0040] FIG. 4a illustrates a few outputs from the polyphase filter for the example of U=5, D=2, with an 11-tap lowpass filter. In particular, the first row of FIG. 4a shows the eleven filter coefficients h(j); the second row shows the input samples x(k) at a spacing of 5 samples due to U=5; the third row shows output samples y(n) with a spacing of 2 samples due to D=2; the fourth row shows sub-filter H₀ with coefficients h(0), h(5), h(10); the fifth row shows the sub-filter H₃ with coefficients h(3), h(8); the sixth row shows the sub-filter H₁ with coefficients h(1), h(6); the seventh row shows sub-filter H₄ with coefficients h(4), h(9); and the eighth row shows sub-filter H₂ with coefficients h(2), h(7). The computations are: $\begin{aligned}
y(0) &= w_{0}(0) \\
&= x(0)H_{0}(0) + x(1)H_{0}(1) + x(2)H_{0}(2) \\
&= x(0)h(0) + x(1)h(5) + x(2)h(10) \\
y(1) &= w_{3}(1) \quad (3 \equiv -(1)(2)\ \mathrm{mod}[5]) \\
&= x(1)H_{3}(0) + x(2)H_{3}(1) \\
&= x(1)h(3) + x(2)h(8) \\
y(2) &= w_{1}(1) \quad (1 \equiv -(2)(2)\ \mathrm{mod}[5]) \\
&= x(1)H_{1}(0) + x(2)H_{1}(1) \\
&= x(1)h(1) + x(2)h(6) \\
y(3) &= w_{4}(2) \quad (4 \equiv -(3)(2)\ \mathrm{mod}[5]) \\
&= x(2)H_{4}(0) + x(3)H_{4}(1) \\
&= x(2)h(4) + x(3)h(9) \\
y(4) &= w_{2}(2) \quad (2 \equiv -(4)(2)\ \mathrm{mod}[5]) \\
&= x(2)H_{2}(0) + x(3)H_{2}(1) \\
&= x(2)h(2) + x(3)h(7) \\
y(5) &= w_{0}(2) \quad (0 \equiv -(5)(2)\ \mathrm{mod}[5]) \\
&= x(2)H_{0}(0) + x(3)H_{0}(1) + x(4)H_{0}(2) \\
&= x(2)h(0) + x(3)h(5) + x(4)h(10) \\
y(6) &= w_{3}(3) \quad (3 \equiv -(6)(2)\ \mathrm{mod}[5]) \\
&= x(3)H_{3}(0) + x(4)H_{3}(1) \\
&= x(3)h(3) + x(4)h(8)
\end{aligned}$

[0041] and so on, where w_(m)(n) is the nth output sample of the mth sub-filter with filter kernel H_(m). Generally, y(n)=w_(−nD mod[U])(floor[(nD+g)/U]) where g is a fixed offset which equals 4 in this example.
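The selection rule can be exercised numerically. The sketch below assumes g = U−1 (which gives g = 4 for U = 5, matching the example; the general choice of g is an assumption here), zero extension beyond the input, and the inner-product form w(k)=Σ u(k+j)h(j). All function names are hypothetical.

```python
def schedule(U, D, n_out):
    """(sub-filter index, first input index) for outputs y(0) .. y(n_out - 1)."""
    g = U - 1                             # assumed offset; equals 4 when U = 5
    return [((-n * D) % U, (n * D + g) // U) for n in range(n_out)]


def resample_direct(x, h, U, D, n_out):
    """Reference: zero-insert by U, filter with all taps, keep every Dth output."""
    u = [0] * (len(x) * U + len(h) + n_out * D)
    u[:len(x) * U:U] = x
    return [sum(u[n * D + j] * h[j] for j in range(len(h)))
            for n in range(n_out)]


def resample_polyphase(x, h, U, D, n_out):
    """Same outputs via one short sub-filter per output sample."""
    phases = [h[p::U] for p in range(U)]
    xp = list(x) + [0] * (len(h) + n_out * D)
    return [sum(xp[start + i] * c for i, c in enumerate(phases[p]))
            for p, start in schedule(U, D, n_out)]


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [100 + j for j in range(11)]          # placeholder 11-tap kernel
print(schedule(5, 2, 7))
assert resample_polyphase(x, h, 5, 2, 7) == resample_direct(x, h, 5, 2, 7)
```

For U=5, D=2 the schedule visits sub-filters 0, 3, 1, 4, 2 and then repeats, with starting inputs 0, 1, 1, 2, 2, 2, 3 for y(0) through y(6).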

[0042] 3. Data Access Blocks and Architecture Kernels

[0043] A data access pattern diagram can illustrate the polyphase filtering. The data access pattern is a two-dimensional plot of dots representing the polyphase filtering with the input sample index running horizontally from left to right and the filtered output sample index running vertically from top down: dots on a row denote data points contributing to the output corresponding to that row and sub-filter. The pattern repeats, so a finite plot suffices. Indeed, for the general case of resampling by a factor of U/D, the pattern repeats every U outputs for every horizontal increment of D inputs. Thus a K/U×U data access block plus indication of the D increment, such as by an empty block, for repeat shows the data access.

[0044] FIGS. 3a-3b give an example with an 11-tap FIR filter and upsampling by a factor of U=3. The first row of FIG. 3a shows the filter kernel h(i) with h(0) at the left end and h(5) the maximum value in the middle and h(10) at the right end; the second row shows x(n) values separated by two 0s for the upsampling; the third row shows the H₀ sub-filter with coefficients h(0), h(3), h(6), h(9) from the first row h(i) and aligned with the x(0), x(1), x(2), x(3) values to compute y(0)=x(0)h(0)+x(1)h(3)+x(2)h(6)+x(3)h(9). The fourth row shows the sub-filter H₂ coefficients h(2), h(5), h(8) at offset 1 to align with the x(n) for the computation y(1)=x(1)h(2)+x(2)h(5)+x(3)h(8); and the fifth row shows the sub-filter H₁ coefficients h(1), h(4), h(7), h(10) at offset 2 to align with the x(n) for the computation y(2)=x(1)h(1)+x(2)h(4)+x(3)h(7)+x(4)h(10).

[0045] FIG. 3b shows the data access block for the example of FIG. 3a with no downsampling (D=1) and the 5×3 block repeats for output y(3), y(4), y(5), as indicated by the empty 5×3 block.

[0046] Generally, upsampling by an integer factor yields a rectangular data access block with some missing spots due to the head and tail of some of the output phases that happen to fall outside of the kernel envelope and thus become zeros. The height of the data access block is U and the width is the smallest integer at least as large as K/U where K is the length of the original filter kernel in output sample scale. For the example of FIGS. 3a-3b, K=11 and the width is 4.

[0047] FIG. 3c shows the data access block for a downsampling by a factor of D=4 following a 7-tap lowpass filter. Downsampling by an integer factor of D generally has a horizontal 1-dimensional array in the data access block because the height is U=1. The width of the block is K where K is the length of the original filter in input sample scale, and the increment to the next data access block is D. This method takes K multiplications per output, which is 1/D times the rate of the straightforward implementation.

[0048] FIG. 4b illustrates the data access block for the FIG. 4a resampling example. Sub-filter numbers are noted in parentheses for convenience in ordering the sub-filters in the bank for easier implementation. The height of the data access block is U=5, and the horizontal increment for the next iteration is D=2. The dots form a generally diagonal band running from upper left to lower right and with a slope of −U/D. The rows have varying widths, but generally the width is roughly K/U; therefore the width of the data access block roughly equals D+K/U.

[0049] The data access pattern provides a good visual cue of the computation:

[0050] (i) U and D can be observed as the height of the data access block and the iteration increment. When the data access block is wide and tall (in units of dots), the resampling is accurate. In contrast, when the data access block is small, the resampling is coarse. Blocks much wider than tall mean large-factor downsampling, and blocks much taller than wide mean large-factor upsampling.

[0051] (ii) The number of dots represents the minimal number of MAC operations per U outputs, as well as the storage requirement for the filter coefficients.

[0052] (iii) Overlap of one row to the next row represents the coverage of input points in the resampling process. When there is no or little overlap, the quality of the resampling may be questionable. When there is much overlap, except for the case of large-factor upsampling, the filter may be longer than necessary.
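A data access block can be generated and printed as text to see these cues. This is an illustrative sketch only: the function names and the `o`/`.` rendering are hypothetical, and the row start index is taken as ceiling[nD/U], which agrees with the U=5, D=2 worked example above.

```python
def data_access_block(U, D, K):
    """For each of the U outputs in one block: (sub-filter, input columns used)."""
    rows = []
    for n in range(U):
        phase = (-n * D) % U
        start = (n * D + phase) // U          # equals ceiling[n*D/U]
        taps = len(range(phase, K, U))        # length of sub-filter `phase`
        rows.append((phase, list(range(start, start + taps))))
    return rows


def render(rows):
    """One text line per output row: 'o' is a dot, '.' is an unused column."""
    width = max((cols[-1] + 1 for _, cols in rows if cols), default=0)
    return ["".join("o" if c in cols else "." for c in range(width))
            for _, cols in rows]


for (phase, _), line in zip(data_access_block(5, 2, 11),
                            render(data_access_block(5, 2, 11))):
    print(f"H{phase}  {line}")
```

Counting the dots gives the number of MACs per block of U outputs; for U and D relatively prime this count equals the kernel length K, since every coefficient is used once per block.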

[0053] Integer-factor upsampling and integer-factor downsampling are not too difficult to implement. Once a filter kernel is designed, the upsampling or downsampling can be derived from a simple FIR implementation of the filter: the downsampling case by shifting inputs by D, and the upsampling case by splitting the filter into U sub-filters and cycling among them.

[0054] Fractional resampling is more difficult. With the original filter coefficients, splitting them into U phases is not a problem. Use the −nD mod[U] expression to compute the order of use of the phases, but the pattern repeats for each group of U outputs, so simply reorder the sub-filters in advance so the output cycles through them sequentially.

[0055] Fractional resampling has the challenge of stepping through the inputs. Input access points from one output to the next vary, and computing the access points on the fly requires division (except when U is a power of 2, in which case the division becomes a shift). Such run-time division should be avoided if at all possible. Even if the accesses for a group of U outputs are hard-coded, the irregular data access makes parallel and pipelined processing difficult.
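One way to keep division out of the per-output path is to tabulate the access points for a single group of U outputs and then advance by D per group. The sketch below illustrates that idea with hypothetical names; it addresses only the division, not the irregularity that complicates parallel processing.

```python
def build_period_table(U, D):
    """Set-up time only: (sub-filter index, input offset) for k = 0 .. U-1."""
    return [((-k * D) % U, (k * D + U - 1) // U) for k in range(U)]


def iter_accesses(U, D, n_out):
    """Yield (sub-filter, first input index) per output using additions only."""
    table = build_period_table(U, D)
    base, k = 0, 0
    for _ in range(n_out):
        phase, offset = table[k]
        yield phase, base + offset
        k += 1
        if k == U:                       # finished one data access block
            k = 0
            base += D                    # next block starts D inputs later


print(list(iter_accesses(5, 8, 7)))
```

The table has only U entries, so the division and modulo are performed U times in total regardless of the signal length.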

[0056] The preferred embodiment methods have an overall strategy to determine what kind of data access in filtering a target processor architecture can do efficiently, and then rearrange the polyphase filtering for resampling in such a way as to achieve maximal efficiency on the target processor. Each of the architectures, and often each specific device within the architecture group, has its own characteristics, constraints, and cost function for addressing regular data patterns and for performing filtering operations. Thus the preferred embodiments introduce a notation to categorize efficient usage models of the processors, to allow for analysis of efficient implementations.

[0057] In particular, each target architecture has its natural ways of stepping through data points and applying MAC operations on them to realize filtering. Single-thread DSPs usually have a lot of freedom. Parallel DSPs usually have certain constraints. Each basic pattern is called an Architecture Multiply-Accumulate Kernel, or architecture kernel for short.

[0058] The notation is similar to the data access pattern and data access block in the foregoing. The data point index again is horizontal, and output points are again vertical, which in hardware means multiple accumulators or registers. Note that an architecture kernel does not necessarily mean what the DSP can do in a single cycle. The architecture kernel is what the DSP can conveniently do from a control point of view. It can be a single cycle, several cycles inside the inner-most loop, or what the inner-most loop can do.

[0059] There are many possible architecture kernels; FIG. 5 lists a few examples with explanations. Typically, a parallel DSP has a few feasible architecture kernels, and they can be picked according to the data access pattern. Often a single-thread DSP has the single data point as the building block, and can implement any regular-shaped data access pattern. Due to the cost of looping and addressing, the simpler access pattern often leads to higher efficiency in the implementation.

[0060] As another example, the image accelerator of the DM310 from Texas Instruments Inc. has 8 MAC units and 6-level looping; and the write alignment corresponds to the number of outputs: 8-word, 4-word, 2-word, or 1-word. FIG. 6a shows the various architecture kernels of the accelerator with 8 outputs, and FIG. 6b shows the 4 output architecture kernels for a simpler accelerator with 4 MAC units and 4-level looping with any-word write alignment.

[0061] A conventional DSP processor, either single-MAC-single-thread or multiple-MAC-VLIW, usually can implement many possible architecture kernels, with varying costs. Making efficient use of such architectures for fractional resampling involves tight assembly coding of possible architecture kernels and tabulating the cycle counts. Normally, addressing and looping will take some overhead. The preferred embodiment strategy is thus to use regular shapes to reduce the number of loop levels.

[0062] Some DSPs have zero-overhead looping for one or two levels. With this feature, such DSPs possess one or more parameterized architecture kernels. For example, on the C54xxx DSP the instruction MAC *AR2+,*AR3+,A can be placed inside of a single-cycle inner-most loop without any overhead. This implements an N-wide kernel, N being programmable.

[0063] As a simple extension to the above kernel, the two MAC instructions

[0064] MAC *AR2,*AR3+,A

[0065] MAC *AR2+,*AR3+,B

can be put inside a loop-block that is zero-overhead in the C54xxx DSP. This implements a 2-row N-wide architecture kernel.

[0066] Efficient architecture kernels on conventional DSPs are usually regular-shaped. In addition, most resampling problems cannot be implemented with just one or two levels of “free” loops. At outer loops, DSP instructions for address adjustment and looping are needed. The most important aspect of parallel/vector DSP implementations, keeping the data access pattern simple and regular, also applies to conventional DSPs. Consequently, the preferred embodiment methods applied to the accelerator in the following can be extended to resampling on conventional DSPs as well.

[0067] 4. Two-Dimensional Image Resampling Implementations

[0068] An implementation of a resampling filter on a processor amounts to covering the data access block dots of the filter with circles of the architecture kernel of the processor. The most efficient implementation is the one with the fewest circles of the architecture kernel not covering data access block dots. The number of dot-covering combinations is finite, so a search can find the most efficient implementation.

[0069] An example will illustrate the method for resampling (resizing) a digital image by a first pass over the image with a horizontal filtering using a one-dimensional filter followed by a second pass over the image with a vertical filtering using a second one-dimensional filter (which may be the same filter as used in the first pass). The first pass filtering resizes the image horizontally, and then the second pass resizes the horizontally-resized image vertically. Presume that the image samples are stored row-wise; that is, in memory adjacent samples correspond to horizontally adjacent image locations except at the ends of rows.
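The two-pass structure can be sketched independently of the filter itself. In the sketch below, `resize_row` is a hypothetical callable standing in for any one-dimensional resampler, and the vertical pass is modeled by transposing, which is a simplification for illustration and not necessarily how a column-wise pass would address row-wise memory.

```python
def resize_2d(image, resize_row):
    """Apply a 1-D resampler along rows, then along columns."""
    horizontal = [resize_row(list(r)) for r in image]      # first pass
    columns = [list(c) for c in zip(*horizontal)]          # view columns as rows
    vertical = [resize_row(c) for c in columns]            # second pass
    return [list(r) for r in zip(*vertical)]               # back to row-wise order


# Stand-in resampler for the demo: keep every 2nd sample (no filtering).
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(resize_2d(image, lambda r: r[::2]))
```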

[0070] Consider a processor with the three architecture kernels shown in FIG. 6b and four looping levels; the image accelerator of the DSC25 from Texas Instruments Inc. includes such architecture kernels and provides four looping levels. Take as the one-dimensional resampling filter a 19-tap filter with resampling ratio of 5/8. More explicitly, 5/8 resampling with a 19-tap filter, h(0), . . . , h(18), has the following computations.

[0071] First, take x(0), x(1), x(2), . . . as the input sequence;

[0072] Second, upsampling by 5 yields:

u(0)=x(0), u(1)=0, u(2)=0, u(3)=0, u(4)=0, u(5)=x(1), u(6)=0,

[0073] Third, lowpass filter with h( ), expressed in inner product format (which corresponds to using h( ) with reversed-order coefficients when h( ) is asymmetrical):

w(0)=u(0)h(0)+u(1)h(1)+u(2)h(2)+ . . . +u(18)h(18)

w(1)=u(1)h(0)+u(2)h(1)+u(3)h(2)+ . . . +u(19)h(18)

w(2)=u(2)h(0)+u(3)h(1)+u(4)h(2)+ . . . +u(20)h(18) . . .

w(8n)=u(8n)h(0)+u(8n+1)h(1)+u(8n+2)h(2)+ . . . +u(8n+18)h(18) . . .

[0074] Fourth, downsampling by 8 gives:

y(0)=w(0)

y(1)=w(8)

y(2)=w(16)

y(k)=w(8k)

[0075] Thus combining the foregoing: $\begin{aligned}
y(0) &= u(0)h(0) + u(1)h(1) + u(2)h(2) + \ldots + u(18)h(18) \\
&= x(0)h(0) + x(1)h(5) + x(2)h(10) + x(3)h(15) \\
&= \text{inner product of } [x(0), x(1), x(2), x(3)] \text{ with H0} \\
y(1) &= u(8)h(0) + u(9)h(1) + u(10)h(2) + \ldots + u(26)h(18) \\
&= x(2)h(2) + x(3)h(7) + x(4)h(12) + x(5)h(17) \\
&= \text{inner product of } [x(2), x(3), x(4), x(5)] \text{ with H2} \\
y(2) &= u(16)h(0) + u(17)h(1) + u(18)h(2) + \ldots + u(34)h(18) \\
&= x(4)h(4) + x(5)h(9) + x(6)h(14) \\
&= \text{inner product of } [x(4), x(5), x(6), x(7)] \text{ with H4} \\
y(3) &= u(24)h(0) + u(25)h(1) + u(26)h(2) + \ldots + u(42)h(18) \\
&= x(5)h(1) + x(6)h(6) + x(7)h(11) + x(8)h(16) \\
&= \text{inner product of } [x(5), x(6), x(7), x(8)] \text{ with H1} \\
y(4) &= u(32)h(0) + u(33)h(1) + u(34)h(2) + \ldots + u(50)h(18) \\
&= x(7)h(3) + x(8)h(8) + x(9)h(13) + x(10)h(18) \\
&= \text{inner product of } [x(7), x(8), x(9), x(10)] \text{ with H3} \\
y(5) &= u(40)h(0) + u(41)h(1) + u(42)h(2) + \ldots + u(58)h(18) \\
&= x(8)h(0) + x(9)h(5) + x(10)h(10) + x(11)h(15) \\
&= \text{inner product of } [x(8), x(9), x(10), x(11)] \text{ with H0} \\
y(6) &= u(48)h(0) + u(49)h(1) + u(50)h(2) + \ldots + u(66)h(18) \\
&= x(10)h(2) + x(11)h(7) + x(12)h(12) + x(13)h(17) \\
&= \text{inner product of } [x(10), x(11), x(12), x(13)] \text{ with H2}
\end{aligned}$

[0076] Here y(5) is a repeat of y(0) but with the x(n) input offset by 8, and y(6) is a repeat of y(1) but with the x(n) input offset by 8; both indicate the 5/8 resampling ratio.

[0077] Thus generally, for k in the range 0 to 4: y(5n+k)=inner product of the 4-vectors [x(8n+m), x(8n+m+1), x(8n+m+2), x(8n+m+3)] and Hj where m is the integer part of (8k+4)/5 and where j is in the range 0 to 4 and j=−8k mod[5]. (H4 may be extended to have 4 coefficients by taking H4=[h(4), h(9), h(14), 0].)

[0078] Note that for 5/8 resampling, the order of the five sub-filters is H0, H2, H4, H1, and H3. In contrast, the analogous computations for 5/7 resampling yield the order of the sub-filters as H0, H3, H1, H4, and H2, and for 5/6 resampling the order becomes H0, H4, H3, H2, and H1.
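The sub-filter orders quoted above follow from j=−Dk mod[U], and the reordered, zero-padded coefficient bank can be built once in advance. The sketch below uses hypothetical function names and placeholder integer taps, and assumes the offset 4 in (8k+4)/5 generalizes to U−1.

```python
def subfilter_order(U, D):
    """Order in which the U sub-filters are used within one block of outputs."""
    return [(-k * D) % U for k in range(U)]


def reordered_bank(h, U, D):
    """Sub-filters in order of use, each zero-padded to ceiling[K/U] taps."""
    width = -(-len(h) // U)
    bank = []
    for p in subfilter_order(U, D):
        taps = list(h[p::U])
        bank.append(taps + [0] * (width - len(taps)))
    return bank


def block_outputs(x, h, U, D, n):
    """y(U*n + k) for k = 0 .. U-1 as inner products with the reordered bank."""
    out = []
    for k, taps in enumerate(reordered_bank(h, U, D)):
        m = (k * D + U - 1) // U                  # integer part of (8k+4)/5 when U=5, D=8
        window = x[n * D + m: n * D + m + len(taps)]
        out.append(sum(a * b for a, b in zip(window, taps)))
    return out


print(subfilter_order(5, 8), subfilter_order(5, 7), subfilter_order(5, 6))
```

With the bank stored in this order, the output loop walks through it sequentially and never evaluates the modulo at run time.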

[0079] FIG. 7a shows the data access block for the 5/8 resampling filter with the height U=5, next block offset D=8, and overall filter length K=19 so that the length of each row of dots is K/U≈4. The 5/8 resampling filter will first be applied to the digital image row-wise to convert an N×M image to a 5N/8×M image; and then the 5/8 resampling filter will be applied column-wise to this horizontally-resized image to yield the final 5N/8×5M/8 resized image.

[0080] With a processor having four levels of loops, use the inner-most (fourth) level to accumulate over the filter coefficients of each sub-filter; that is, for the dots in a row of the access coverage chart. Each of the three architecture kernels has only one circle per row (that is, the processor does one MAC for each of the four inputs that are being processed in parallel), and so the inner-most loop needs at least ceiling[K/U] iterations; sometimes somewhat more due to the difference between U/D and the slope of the kernel pattern. For example, with the “1:1 slope” kernel and the above 5/8 resampling 19-tap filter (FIG. 7a data access block), the parallel computation of 4 outputs, y(5n), y(5n+1), y(5n+2), y(5n+3), takes at least 6 iterations; namely, the first iteration uses parallel inputs x(8n)-x(8n+3), the second iteration parallel inputs x(8n+1)-x(8n+4), . . . , and the sixth iteration parallel inputs x(8n+5)-x(8n+8). FIG. 7b illustrates the 6 iterations as circles; circles without dots are MACs with a 0 coefficient for the sub-filter; that is, the sub-filters are extended to 6-vectors by inserting 0s.

[0081] The second inner-most (third level) loop is used to generate the U outputs of the data access block; for the 5/8 example U=5 and this takes 2 iterations because only 4 outputs are provided by the inner-most (fourth level) loop. Generally, ceiling[U/H] iterations are needed for a kernel with H outputs. Explicitly for the 5/8 19-tap filter example, the first iteration of the third level loop has the fourth level loop computing y(5n), y(5n+1), y(5n+2), y(5n+3) and writing these 4 outputs to memory; then the second iteration increments the input by 7 from the first iteration starting point and replaces the sub-filters H0, H2, H4, H1 with the sub-filter H3 and three 0s, and then again executes the fourth level loop to compute 4 outputs: y(5n+4) and three zeros. These outputs are written to the next 4 memory locations, but the memory pointer will be decremented by 3 so the 3 zero outputs will be discarded by overwriting in the next calling of the third level loop. FIG. 7b illustrates this with the second iteration of the third level loop taking the input data starting point at the same data point as the end of the first iteration, x(8n+5), but with a 0 coefficient for the H3 sub-filter; see the initial empty circle on the fifth row. Alternatively, the second iteration could increment the input starting point by 1 or 2 from the end of the first iteration, x(8n+6) or x(8n+7). This would just shift the circles in the fifth row and change the location of the two 0 coefficients added to H3.

[0082] The second and first loop levels are used for stepping the filtering along an image row and stepping through the rows of the image, respectively. In particular, for the 5/8 19-tap filter example, an iteration in the second level loop increments the input pointer by 8, and executes the third level loop which computes and writes to memory the next 5 outputs. Thus iterating the second level loop N/8 times corresponds to resampling a row of length N to yield a single row of length 5N/8. Then the outer-most (first level) loop steps through the rows of the image; the overall result is a horizontal resizing by 5/8 with no vertical change.
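The four loop levels can be modeled in software to check that the arrangement reproduces the plain resampling result. This is a simplified sketch under stated assumptions, not the accelerator's actual programming model: it assumes a kernel in which parallel output r reads input (start+i+r) on inner iteration i (a "1:1 slope" reading), treats samples outside the row as 0, and simply omits the padded outputs instead of writing and later overwriting them. All names are hypothetical.

```python
def at(seq, i):
    """seq[i], or 0 when i is outside the stored samples."""
    return seq[i] if 0 <= i < len(seq) else 0


def plan_groups(h, U, D, H):
    """Inner-loop length plus, per group of H outputs, start offset and padded taps."""
    table = [((-k * D) % U, (k * D + U - 1) // U) for k in range(U)]
    raw, inner = [], 0
    for g in range(0, U, H):
        grp = table[g:g + H]
        base = min(s - r for r, (p, s) in enumerate(grp))
        end = max(s - r + len(h[p::U]) for r, (p, s) in enumerate(grp))
        inner = max(inner, end - base)
        raw.append((base, grp))
    groups = []
    for base, grp in raw:
        coefs = []
        for r, (p, s) in enumerate(grp):
            shift = s - r - base                   # leading zero-coefficient MACs
            coefs.append([at(h[p::U], i - shift) for i in range(inner)])
        groups.append((base, coefs))
    return inner, groups


def resize_rows(image, h, U, D, H=4):
    inner, groups = plan_groups(h, U, D, H)
    out = []
    for row in image:                              # level 1: rows of the image
        y = []
        for blk in range(0, len(row), D):          # level 2: step D inputs per block
            for base, coefs in groups:             # level 3: groups of up to H outputs
                acc = [0] * len(coefs)
                for i in range(inner):             # level 4: accumulate over taps
                    for r in range(len(coefs)):    # one MAC unit per output r
                        acc[r] += at(row, blk + base + i + r) * coefs[r][i]
                y.extend(acc)
        out.append(y)
    return out


print(plan_groups(list(range(1, 20)), 5, 8, 4)[0])
```

For the 5/8 19-tap case with H=4 the computed inner-loop length is 6, in line with the iteration count given above.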

[0083] FIG. 8b shows the computations available from the three possible architecture kernels in the case of the inner-most loop having 5 iterations; and FIG. 8c shows the access coverage of the data access block of FIG. 8a by the computation pattern on the left in FIG. 8b (corresponding to the kernel denoted “1:1 slope” in FIG. 6b). The dots of the data access block more closely match the pattern of the “1:1 slope” architecture kernel because the slope of the band of dots is roughly 1 for the horizontal variable increasing to the right and the vertical variable increasing downwards. Indeed, the architecture kernel which best aligns with the U/D angle of the data access block should likely lead to the best coverage. Thus with the architecture kernels of FIG. 6b, when U/D is much less than 1, use the “1:1 slope” kernel, when U/D roughly equals 2, use the “2:1 slope” kernel, and when U/D is much greater than 2, use the “4 tall” kernel. Of course, the optimum kernel is found by searching over the three possibilities. The access coverage chart, FIG. 8c, for the 5/8 resampling 19-tap filter shows an implementation in which the sub-filters (rows) are 5-tap type FIR filters. The differences from general 5-tap filters are that (a) this is a 5-phase filter (U=5), (b) after computing 8 outputs, the input pointer is shifted by 8 samples, and (c) after writing out 8 outputs, the output pointer is rolled back by 3, which discards the last 3 outputs by subsequent overwriting. The access coverage chart denotes the discard by the 3 strikeout rows.

[0084] In an access coverage chart the circles that cover dots represent meaningful computations, and the circles that are empty represent wasted computation (multiplying 0s or producing outputs that will be discarded). The efficiency of an implementation can be clearly observed as the ratio of the number of circled dots divided by the total number of circles. In the example of FIG. 8c, the efficiency is 19/40=47.5%. Similarly, for the 5/8 resampling example, FIG. 7b shows the access coverage chart if 4 parallel outputs were available as in FIGS. 8a-8c; and as a contrast, FIG. 7c shows the access coverage chart if 5 parallel outputs had been available. Note that with 4 parallel outputs the sub-filter length is 6, but a second iteration of the third level loop is needed and three 0 sub-filters are used, so the efficiency is 19/48=39.6%. In contrast, with 5 parallel outputs available the filter length would be 7, but only a single third level loop iteration is needed, and the efficiency is 19/35=54.3%.
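The three efficiency figures quoted above can be reproduced with a short sketch. The function name and the decomposition of the circle count into rows times taps per row are assumptions made for illustration; the circle totals themselves (40, 48, 35) and the 19 useful multiply-accumulates come from the text.

```python
from fractions import Fraction

def coverage_efficiency(useful_macs, rows, taps_per_row):
    """Circled dots divided by all circles in an access coverage chart.

    Assumes every row of the chart is padded to the same number of taps,
    so the total number of circles is rows * taps_per_row.
    """
    return Fraction(useful_macs, rows * taps_per_row)

# 5/8 resampling with a 19-tap filter: 19 useful multiply-accumulates.
charts = [
    ("FIG. 8c", 8, 5),  # 8 rows of 5 taps (3 rows later discarded)
    ("FIG. 7b", 8, 6),  # 4 parallel outputs, two third-level iterations
    ("FIG. 7c", 5, 7),  # 5 parallel outputs, one third-level iteration
]
for label, rows, taps in charts:
    eff = coverage_efficiency(19, rows, taps)
    print(f"{label}: 19/{rows * taps} = {float(eff):.1%}")
```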

[0085] During the second pass of one-dimensional vertical filtering, columns of the horizontally-resampled image are processed in parallel. Each one-dimensional filtering problem really uses the single-circle architecture kernel. Address generation allows implementation of any regular shape coverage. Indeed, with four levels of loops, use the inner-most (fourth) level to cover the filter coefficients of each sub-filter; that is, for the dots in a row of the access coverage chart which correspond to an image column. More explicitly for the 5/8 19-tap example, let w(j,k) denote the 5/8-horizontally-resized image from the first pass filtering; then the single output y(j,5n) is the inner product of H0 and the 4-vector [w(j,8n), w(j,8n+1), w(j,8n+2), w(j,8n+3)]. So the row of dots in the access coverage chart represents these inputs, which are stored at successive addresses (for resized rows of length 5N/8) j+8n(5N/8), j+(8n+1)(5N/8), j+(8n+2)(5N/8), j+(8n+3)(5N/8); that is, the input address generator increments by 5N/8, which is the number of samples in a row of the horizontally-resized image. And the 4 MAC units could be outputting y(j,5n), y(j+1,5n), y(j+2,5n), y(j+3,5n) in parallel; that is, each MAC unit computes the inner product of H0 with the 4-vector starting at w(j,8n), w(j+1,8n), w(j+2,8n), or w(j+3,8n), respectively, and extending vertically. The inner-most loop iterations are the inner product computations.

[0086] The third level loop is used to generate the U phase outputs of the filter; that is, step through the sub-filters H0, H1, . . . , H(U−1) and for each sub-filter the corresponding vector of samples. Again with the 5/8 19-tap example, the third level loop successively computes the inner products of sample vectors with H0, H1, H2, H3, and H4. As noted in the previous paragraph, for the H0 filterings the 4 MAC units use successive 4-vectors [w(j,8n), w(j,8n+1), w(j,8n+2), w(j,8n+3)], [w(j+1,8n), w(j+1,8n+1), w(j+1,8n+2), w(j+1,8n+3)], [w(j+2,8n), w(j+2,8n+1), w(j+2,8n+2), w(j+2,8n+3)], [w(j+3,8n), w(j+3,8n+1), w(j+3,8n+2), w(j+3,8n+3)], respectively. Then the second iteration computes the inner products of H1 with [w(j,8n+1), w(j,8n+2), w(j,8n+3), w(j,8n+4)], [w(j+1,8n+1), w(j+1,8n+2), w(j+1,8n+3), w(j+1,8n+4)], and so forth. Note that the address generator for each MAC unit may increment by 5N/8 for each third loop iteration, the same as the fourth loop increments, and the offset of addresses between MAC units is just 1 (adjacent columns).
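The vertical-pass addressing just described can be sketched as follows. This is only an illustration under stated assumptions: the image w is taken to be stored row by row so that w(j,k) sits at address j + k*row_len, and the function name and the parameters row_len and phase_offset are hypothetical names introduced here, not identifiers from the processor.

```python
def vertical_pass_addresses(j, n, row_len, phase_offset, taps=4, macs=4, D=8):
    """Addresses read by each MAC unit for one sub-filter inner product.

    j            first of the `macs` adjacent columns processed in parallel
    n            index of the data access block (input rows start at D*n)
    row_len      samples per row of the horizontally-resized image (5N/8)
    phase_offset extra row offset for the current sub-filter (0 for H0, 1 for H1)
    Assumes row-major storage: address of w(col, row) = col + row * row_len.
    """
    first_row = D * n + phase_offset
    return [[(j + m) + (first_row + t) * row_len for t in range(taps)]
            for m in range(macs)]

# Example: N = 64 input columns, so each resized row holds 5*64/8 = 40 samples.
for mac, addrs in enumerate(vertical_pass_addresses(j=0, n=0, row_len=40, phase_offset=0)):
    print(f"MAC {mac}: {addrs}")
```

Within one MAC unit the addresses step by row_len, and between MAC units they differ by 1, matching the increments described in the text.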

[0087] Thus the output of the inner-most and next level loops is a U tall by 4 wide resized-both-vertically-and-horizontally image. The second and first loop levels are used to repeat the processing horizontally to a desired larger output width and then to repeat vertically for the height of the output array, which should be a multiple of U to be efficient.

[0088] Compared with the first pass horizontal resampling, the vertical resampling second pass loses one degree of freedom in data stepping through the data access block. In particular, the horizontal pass has the inherent parallelism of the 4 MAC units to yield 4 outputs, and the third level loop to step the 4-output processing through groups of 4 outputs for the U phases of the filter. The third level loop provides an opportunity to have an offset between groups to adjust the slope of processing; see FIG. 7b which shows an offset of 5 in the fifth row (offsets of 6 or 7 could also have been used with different 0 coefficient padding for H3).

[0089] In contrast, the second pass vertical resampling uses the parallelism of the 4 MAC units to process 4 image columns independently. In terms of the data access pattern, only one output is generated by the inner-most (fourth level) loop. The two outer-most level loops provide width and height flexibility of the output array. Thus, there is only the third level loop to go down the data access pattern, and therefore any offset can be programmed between rows, but the same offset applies to every row. For upsampling, this fixed offset per row provides less slope matching accuracy than the horizontal first pass. On the other hand, the third level loop can go for U iterations to compute exactly U rows in the data access block, compared to the 4*ceiling[U/4] rows that the horizontal pass is executing, and therefore a little bit of efficiency is regained.

[0090] The addressing freedom in vertical resampling works better for downsampling. For fractional upsampling, we have to pick between 1:1 slope (offset-1) or infinite slope (offset-0).

[0091] Note that the addressing freedom difference in the horizontal and vertical resampling is very specific to the processor architecture. The 4 levels of looping and the desire to have width-height flexibility leave only one level for the vertical pass to go through the U outputs. If we have more loop levels or can sacrifice either width or height looping (first or second level), we can use one more level and provide better slope control. If U or D is fixed at some convenient number, such as 8, 16, or 32, for data storage, we can do without either the output width loop or the output height loop, and give one more loop to intra-data-access-block control.

[0092] Similar to horizontal resampling, we look at the data access pattern, consider the addressing freedom we have, and devise an access coverage chart to implement the resampling. Without the 4-output grouping, we never have to produce redundant outputs. However, the reduced addressing freedom means sometimes we may have more multiplying-by-zero kind of waste. We have an overall efficiency of 19/25=76% with the FIG. 8c access coverage chart for the 5/8 resampling in the vertical pass.

[0093] 5. Multiple resampling ratios

[0094] The following section 6 describes preferred embodiment generic resampling methods; that is, a resampling method that determines how to implement U/D resampling given U and D, without any pre-computed information. However, frequently a resampling application has constraints on the set of resampling factors or the set is given. This section considers an example of a given set of resampling factors in order to provide an approach to the generic method of section 6.

[0095] Consider the example of the set of resampling factors 4/3, 5/3, 2, 7/3, 8/3, 3, and 10/3. These are 1/3 steps that, together with a 3× optical zoom, provide 4×, 5×, . . . , 10× zoom capability for a digital camera. That is, U/D resampling with D=3 (or 1) and U in the set {2,3,4,5,7,8,10}. The following methodology also applies to other sets of resampling factors.

[0096] The example presumes use of the 4U-long filter kernel obtained by applying a 4U-wide triangular window on a sinc function. FIG. 9 illustrates the sinc function plus a triangular window function and the product filter kernel. The length of the filter is a tradeoff between computation complexity and signal (image) quality; a length of 4U is used in the example. Because the window vanishes at the endpoints, the first and last samples of the digitized filter kernel are 0. That is, for a resampling factor of U/D, the digital filter kernel will be a (4U−1)-tap filter.

[0097] First consider the 4/3 resampling in detail. The filter length is 15 taps; but for convenience, index the filter coefficients from 0 to 16 where the 0th and 16th are both 0. For 4/3 resampling, there are U=4 phases (sub-filters) as shown in FIG. 10. Note that the input index is offset by 1 so that the center maximum of the sub-filter H0 multiplies x(3j) as part of the inner-product computation for y(4j). The inner products, denoted <|>, for one set of 4 outputs are:

y(4j)=<H0|[x(3j−1), x(3j), x(3j+1)]>

y(4j+1)=<H1|[x(3j−1), x(3j), x(3j+1), x(3j+2)]>

y(4j+2)=<H2|[x(3j), x(3j+1), x(3j+2), x(3j+3)]>

y(4j+3)=<H3|[x(3j+1), x(3j+2), x(3j+3), x(3j+4)]>

[0098] In general, y(Uj+k)=

<Hk′|[x(Dj+ceiling{(kD−2U+1)/U}), . . . , x(Dj+floor{(kD+2U−1)/U})]>

[0099] where k′=−kD mod[U]. FIG. 11 illustrates this general expression.
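The general expression can be checked numerically with a small sketch. The function name is an assumption introduced here; the formulas for the filter phase and for the first and last input offsets (relative to x(Dj)) are those stated above for a (4U−1)-tap kernel.

```python
def phase_and_range(U, D, k):
    """Filter phase and input range for output phase k of U/D resampling.

    Returns (filter_phase, first_input, last_input), where the inputs are
    offsets relative to x(D*j), following the general expression in the text:
      filter_phase = -k*D mod U
      first_input  = ceiling((k*D - 2*U + 1) / U)
      last_input   = floor((k*D + 2*U - 1) / U)
    Integer arithmetic is used so no floating-point rounding can occur.
    """
    filter_phase = (-k * D) % U
    first_input = -((2 * U - 1 - k * D) // U)   # ceiling via negated floor
    last_input = (k * D + 2 * U - 1) // U
    return filter_phase, first_input, last_input

# Print the rows for the 7/3 zoom factor.
for k in range(7):
    print(k, *phase_and_range(7, 3, k))
```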

[0100] Explicitly, the phases of the filters and the data access points for the set of resampling factors of the example are:

Output phase   Filter phase   First input   Last input

Zoom = 4/3
0              0              −1            1
1              1              −1            2
2              2              0             3
3              3              1             4

Zoom = 5/3
0              0              −1            1
1              2              −1            2
2              4              0             3
3              1              0             3
4              3              1             4

Zoom = 4/2
0              0              −1            1
1              2              −1            2
2              0              0             2
3              2              0             3

Zoom = 7/3
0              0              −1            1
1              4              −1            2
2              1              −1            2
3              5              0             3
4              2              0             3
5              6              1             4
6              3              1             4

Zoom = 8/3
0              0              −1            1
1              5              −1            2
2              2              −1            2
3              7              0             3
4              4              0             3
5              1              0             3
6              6              1             4
7              3              1             4

Zoom = 3/1
0              0              −1            1
1              2              −1            2
2              1              −1            2

Zoom = 10/3
0              0              −1            1
1              7              −1            2
2              4              −1            2
3              1              −1            2
4              8              0             3
5              5              0             3
6              2              0             3
7              9              1             4
8              6              1             4
9              3              1             4

[0101] Of these resampling factors, the factor 2 (=6/3) is implemented as 4/2 rather than just upsampling by 2 because the processor has 4 MAC units and producing four outputs in parallel is then more efficient. In contrast, the resampling factor 3 is left as upsampling by 3.

[0102] FIG. 12 shows the data access blocks for this set of resampling factors and kernels. These data access blocks were generated individually for each resampling factor by using (i) the general expression for y(Uj+k) to find the input range for each output phase (where to put the dots), (ii) the best fit architecture kernel of the three available for each resampling factor using the data step between output groups (kernel height), generally the collective height of the access points is ceiling(U/kernel_height)*kernel_height, (iii) each access coverage chart also provides the origin of the access points, defined by the data index of the first access point; and (iv) the resampling factor 2 was recast as 4/2 due to the 4 MAC unit structure. Thus pre-computed parameters to be used in a run-time digital zoom program (input parameters the resampling ratio U/D) would be: architecture kernel,

[0103] Generalizing the foregoing example to other processors (different set of architecture kernels) and/or other resampling ratios requires other pre-computations. However, practical limitations on the ranges of the parameters should allow compact representation. In particular, the following ranges:

[0104] U and D in the range 1 to 8.

[0105] Architecture kernel height in the range 1 to 15.

[0106] Number of horizontal and vertical filter taps in the range 4 to M, where M=max(U*multiply_factor, D*multiply_factor) and multiply_factor is an integer such as 2 or 4 to ensure sufficient filter coefficients in the case of small U.

[0107] Horizontal and vertical data starting point in the range −M to 0.

[0108] Horizontal and vertical output data step in the range 0 to M.

[0109] This means a description by two small numbers (multiply_factor, architecture kernel) of about 2-4 bits plus six numbers (two filter taps, two data starting points, and two output data steps) of byte size to specify each resampling ratio setup for resampling a two-dimensional image. Thus four 16-bit words should hold the setup parameters for one resampling ratio. For example, FIG. 13 shows the parameters for the setups of FIG. 12.

[0110] To construct the filtering from the parameters, proceed as follows:

[0111] (a) compute the upsampling filter coefficients as samples of a 4*max(U,D) long windowed sinc (=sin x/x) function.

[0112] (b) compute the phase of the sub-filter required for each of the U outputs in a data access block.

[0113] (c) compute the starting and ending data points needed for each output.

[0114] (d) compute the starting data point accessed for each output (horizontally and vertically) by the architecture kernel, data step per group, and starting point.

[0115] (e) the difference between the first data point needed and the first data point accessed tells us how many leading zeros should be packed into the sub-filter coefficient array.

[0116] (f) fill the sub-filter coefficients with the upsampling filter kernel samples with the phasing and zero-padding from step (e).
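Steps (a) through (f) can be sketched for the upsampling case (U ≥ D) as below. This is a minimal illustration, not the run-time program itself: the function names are hypothetical, the sinc is assumed to have zero crossings every U kernel samples, and only the 4U-long kernel of the section 5 example is handled. Zero-padding falls out of treating any kernel index outside 0..4U as a zero coefficient.

```python
import math

def windowed_sinc(U):
    """4U-long triangular-windowed sinc, indexed 0..4U (end samples are 0).

    Assumption: sinc zero crossings are spaced U kernel samples apart.
    """
    c = 2 * U  # index of the center peak
    h = []
    for i in range(4 * U + 1):
        t = (i - c) / U
        s = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        h.append(s * (1 - abs(i - c) / c))  # triangular window
    return h

def padded_subfilter(h, U, D, k, first_accessed, num_taps):
    """Coefficients applied to inputs first_accessed .. first_accessed+num_taps-1.

    Input offset m (relative to x(D*j)) is weighted by kernel sample
    h[2U - k*D + m*U], whose index is congruent to -k*D mod U, i.e. the
    sub-filter phase of the text; out-of-range indices give padding zeros.
    """
    coeffs = []
    for m in range(first_accessed, first_accessed + num_taps):
        i = 2 * U - k * D + m * U
        coeffs.append(h[i] if 0 <= i <= 4 * U else 0.0)
    return coeffs

h = windowed_sinc(7)
# 7/3 horizontal example: output phase 2 accessed from input offset -1, 5 taps.
print(padded_subfilter(h, 7, 3, 2, first_accessed=-1, num_taps=5))
```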

[0117] 6. Multiple Resampling Ratios Generation at Run-Time

[0118] The preceding section 5 describes a manual process of looking up a set of resampling factors, capturing essential parameters, and using a run-time program to reconstruct the previously-determined filtering scheme based on the parameters. This is just one of four alternative approaches, from highly-precomputed to run-time determined, that are available for a digital zoom of the type described in section 5; these alternatives use pre-computed information together with a stored run-time program:

[0119] A. Pre-compute all the processor (e.g., 4 MAC units) commands, with filter coefficients pre-arranged and zero-padded according to access coverage charts. For seven resampling factors with sub-filters as in section 5, this roughly will take 7*2*20=280 16-bit words of commands and 2*(4+5+4+7+8+3+10)*10=820 words of filter coefficients, for a total of 1,100 words of pre-computed information for the 4-MAC-unit processor.

[0120] B. Pre-compute parameters sufficient to generate the commands (4 words per zoom factor as in FIG. 13); also include all filter coefficient values, pre-arranged and zero-padded according to coverage charts, roughly 820 words for the sub-filters of section 5, which totals roughly 850 words.

[0121] C. Pre-compute parameters sufficient to generate the processor commands, but the program generates the filter coefficients plus commands. This takes about 7*4=28 words for the set of seven resampling factors; sinc and windowing functions for filter coefficients are computed online, and the sinc function, in particular, needs a sine function that can take up some code space.

[0122] D. Use a processor program to plan resampling on-the-fly, construct filter coefficients and MAC unit commands without any pre-computed information.
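The storage estimates for alternatives A through C can be tallied with a few lines of arithmetic. The variable names are illustrative; the multipliers 2, 20, 10, and 4 are taken directly from the estimates in the text, and the list of U values corresponds to the seven zoom factors of section 5 (with the factor 2 implemented as 4/2).

```python
# U for zoom factors 4/3, 5/3, 4/2, 7/3, 8/3, 3/1, 10/3 (section 5 example).
U_VALUES = [4, 5, 4, 7, 8, 3, 10]

command_words = len(U_VALUES) * 2 * 20        # pre-computed MAC commands
coefficient_words = 2 * sum(U_VALUES) * 10    # pre-arranged, zero-padded coefficients
parameter_words = len(U_VALUES) * 4           # four 16-bit words per zoom factor

totals = {
    "A": command_words + coefficient_words,    # commands + coefficients
    "B": parameter_words + coefficient_words,  # parameters + coefficients
    "C": parameter_words,                      # parameters only
    "D": 0,                                    # nothing pre-computed
}
for name, words in totals.items():
    print(f"alternative {name}: {words} words of pre-computed information")
```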

[0123] The level of difficulty and code size in a digital zoom program increases with a decrease in pre-computed information. The procedure described in foregoing section 5 follows alternative C. Alternative D can be realized by following the reasoning of section 3, and the program can be simplified by relaxing optimality in searches.

[0124] With a small set of zoom factors, alternative A probably achieves the smallest overall program/data size. A modest-sized set of zoom factors suggests alternative C; and a large set of possible zoom factors demands alternative D.

[0125] Note that the program of alternative D could be run off-line to generate parameters. Then capture the parameters for a simpler run-time program (e.g., alternative C) to expand the parameters into a full implementation. Similarly, if there are so few resampling factors that alternative A produces a smaller program plus data than alternative C, then the MAC commands can be captured by running alternative C offline and alternative A can be used at run-time.

[0126] FIG. 1 shows the steps of a program of alternative D with inputs U and D. The steps execute for both a horizontal pass and for a vertical pass of a two-dimensional zoom as with the example of section 5; the steps will be explained in detail after the following listing.

[0127] (a) compute the coefficients of the phase 0 to phase U−1 sub-filters from the 4*max(U,D) samples of the windowed sinc function; this also provides relative first and last data points needed for each sub-filter;

[0128] (b) pick a multiply factor if the value of U is small;

[0129] (c) for each architecture kernel available, make a first estimate of a data step per (output) group by the integer closest to H*D/U, and consider five estimates for the data step per group as: the first estimate, the first estimate ±1, and the first estimate ±2;

[0130] (d) for each combination of architecture kernel plus data step per group estimate, compute the best starting data point and best sub-filter length (number of taps);

[0131] (e) for each combination with starting data point and sub-filter length from (d), register the computation cost (basically, the sub-filter length);

(f) pick the combination with the minimal computation cost (fewest sub-filter taps) for the resampling implementation.

[0132] In more detail, there are eight parameters for each resampling factor, U/D, and an exhaustive search through all the combinations takes on the order of M³ trials where M is max(U,D). This would be tolerable for an offline program, but not for run-time. Thus the search space is reduced at the expense of resampling efficiency; but if we set all the parameters to work with any case, we lose efficiency in the resampling. For example, we know that all required data accesses fit inside the box in the data access block, taking U*(D+ceiling(K/U)) multiply-accumulates, while only K multiply-accumulates are required. However, for the example of section 5, there are only three kernels for the horizontal pass and one kernel for the vertical pass, so iteration through all choices is fine. The multiplying factor (multiply_factor in FIG. 13) is needed to get sufficient numbers of outputs to make use of the 4-way parallelism of the 4-MAC processor, so a simple rule is used: when U=1, take multiply_factor=4; when U=2, take multiply_factor=2; otherwise, multiply_factor=1 (no change).

[0133] The architecture kernel decides the fine structure of the MAC edge, and the data step per group decides the big picture. (For a small U that needs only one group in the horizontal pass because U is not greater than H, the kernel height, the architecture kernel alone sets the edge.) The data step per group thus should have the edge match U/D. Thus guess at the optimal data step per group value as the closest integer to H*D/U and then consider the −2, −1, 0, +1, +2 increments of this first guess data step per group value to capture the best value. Thus for the example of section 5 there are 15 combinations of architecture kernel and data step per group for the horizontal pass (3 kernels and 5 data step per group values), and 5 combinations for the vertical pass.
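The candidate generation described above can be sketched as follows. The function name is hypothetical, ties in "closest integer" are rounded up by assumption, and the three horizontal kernels are assumed here to share a height of H=4 (one output per MAC unit). A practical program might also discard candidates outside the allowed data step range; this sketch keeps all five.

```python
def candidate_steps(H, D, U):
    """Five data-step-per-group candidates centered on the integer closest to H*D/U.

    Uses integer arithmetic; exact halves round up (an assumption, since the
    text only says "closest integer").
    """
    first_estimate = (2 * H * D + U) // (2 * U)
    return [first_estimate + delta for delta in (-2, -1, 0, 1, 2)]

# Horizontal pass of the 7/3 example: H*D/U = 12/7, so the first estimate is 2.
KERNELS = ["1:1 slope", "2:1 slope", "4 tall"]
combinations = [(name, step) for name in KERNELS for step in candidate_steps(4, 3, 7)]
print(len(combinations), "combinations, e.g.", combinations[:5])
```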

[0134] For each combination of architecture kernel and data step per group, the optimal starting data access point can be computed as follows. Let mac_edge[i] be the first data point accessed for phase i output, relative to the first data point accessed for phase 0 output; this just indicates the shape of the kernel. For example, presume the “2:1 slope” kernel with a data_step_per_group of 2 (such as for the 7/3 resampling of FIG. 12), then mac_edge[1]=0 because the phase 1 sub-filter output is aligned with the phase 0 sub-filter output in the “2:1 slope” kernel, mac_edge[2]=mac_edge[3]=1 again from the “2:1 slope”, mac_edge[4]=mac_edge[5]=2 from the data_step_per_group=2, and mac_edge[6]=3 (for the 7/3 example, the phase 7 output is not used, so ignore mac_edge[7]). Then define the data point to start the kernel at, using mac_edge[0]=0:

data_start_best=min_(i)(first_data_required[i]−mac_edge[i])

[0135] where first_data_required[i] is the first data point used in the phase i output sub-filter. Again with the 7/3 example, if n=first_data_required[0], then first_data_required[1]=first_data_required[2]=n, first_data_required[3]=first_data_required[4]=n+1, and first_data_required[5]=first_data_required[6]=n+2. Thus data_start_best=n−1; that is, one data point before the first data point needed by the phase 0 output; the empty circle in the first row of the 7/3 example reflects this.

[0136] Then the number of taps needed (with all sub-filters padded tothe same number of taps) is

num_taps_best=max_(i)(last_data_required[i]−mac_edge[i]−data_start_best)+1

[0137] where last_data_required[i] is the last data point needed for the phase i output. So once more with the 7/3 example and the first data point needed again called n, last_data_required[0]=n+2, last_data_required[1]=last_data_required[2]=n+3, last_data_required[3]=last_data_required[4]=n+4, and last_data_required[5]=last_data_required[6]=n+5. Thus num_taps_best=5 as shown in the 7/3 example by the rows being 5 circles long.
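The starting point and tap count for the 7/3 example can be worked through in a short sketch. The function name is an assumption. Because all sub-filters are padded to the same length, the tap count is taken here as the largest extent over the phases plus one, which reproduces the 5 taps stated in the text; n is set to −1 as in the 7/3 table of section 5.

```python
def start_and_taps(first_required, last_required, mac_edge):
    """Best starting data point and common sub-filter length for one combination.

    first_required[i], last_required[i]: first/last data point needed by phase i.
    mac_edge[i]: first point accessed for phase i relative to phase 0.
    """
    data_start_best = min(f - e for f, e in zip(first_required, mac_edge))
    # Largest span any row must cover, counted inclusively (hence the +1).
    num_taps_best = max(l - e - data_start_best
                        for l, e in zip(last_required, mac_edge)) + 1
    return data_start_best, num_taps_best

# 7/3 horizontal pass, "2:1 slope" kernel, data_step_per_group = 2, n = -1.
first_required = [-1, -1, -1, 0, 0, 1, 1]
last_required = [1, 2, 2, 3, 3, 4, 4]
mac_edge = [0, 0, 1, 1, 2, 2, 3]
print(start_and_taps(first_required, last_required, mac_edge))
```

The result is a start of n−1 = −2 and 5 taps per sub-filter.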

[0138] Thus the computation cost of the combination, “2:1 slope” and data_step_per_group=2, registers as 5-tap sub-filters. FIG. 13 shows the corresponding parameter values for the FIG. 12 example; note that the phase 0 sub-filter has a center peak which is aligned to input data point 0, which leads to the starting access data point being −2 as listed in FIG. 13 for the 7/3 horizontal.

[0139] Also for comparison in the 7/3 example, the computational cost of the combination of “2:1 slope” with data_step_per_group=3 would be as follows. First, mac_edge[1]=0, mac_edge[2]=mac_edge[3]=1, mac_edge[4]=mac_edge[5]=3 from the data_step_per_group=3, and mac_edge[6]=4. Next, first_data_required[i] remains unchanged, so data_start_best changes from n−1 (where n denotes the first data point needed by the phase 0) to n−2 because of the increase in either mac_edge[4] or mac_edge[6]. That is, the increase in data_step_per_group causes the start to be two data points before the first needed data point of the phase 0; and this, in turn, leads to an increase in num_taps_best from 5 to 6. Thus the computation cost is higher for the combination “2:1 slope” and data_step_per_group=3, and this combination is rejected. Similarly for the other combinations, so the 7/3 combination selected for the horizontal pass is the one shown in FIG. 12.
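The comparison across data steps can be sketched as a small search for the “2:1 slope” kernel alone. All function names are hypothetical, the kernel shape is modeled as two outputs sharing each access offset within a group of 4, and the tap count is again taken as the largest extent plus one. The other two kernels would be handled the same way with their own mac_edge shapes, which are not modeled here.

```python
def required_range(U, D, k):
    """First and last input needed by output phase k (general expression of section 5)."""
    return -((2 * U - 1 - k * D) // U), (k * D + 2 * U - 1) // U

def mac_edge_2to1(num_phases, step, group_height=4):
    """Access offsets for an assumed "2:1 slope" kernel shape."""
    return [(i // group_height) * step + (i % group_height) // 2
            for i in range(num_phases)]

def cost(U, D, step):
    """Common sub-filter length (number of taps) for one data step per group."""
    ranges = [required_range(U, D, k) for k in range(U)]
    edge = mac_edge_2to1(U, step)
    start = min(first - e for (first, _), e in zip(ranges, edge))
    return max(last - e - start for (_, last), e in zip(ranges, edge)) + 1

# 7/3 example: first estimate 2, candidates 0..4.
costs = {step: cost(7, 3, step) for step in range(0, 5)}
best_step = min(costs, key=costs.get)
print(costs, "-> best data_step_per_group:", best_step)
```

Under these assumptions a data step of 2 gives 5 taps and a data step of 3 gives 6 taps, in agreement with the comparison in the text.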

What is claimed is:
 1. A method of resampling a data sequence, comprising: (a) providing filter coefficients according to an input resampling ratio U/D where U and D are positive integers, said coefficients grouped into U sub-filters according to phase and corresponding to a data access block; (b) for each of a plurality of architecture kernels: (i) provide a step per group of H of said sub-filters from a first set of integers about H*D/U where H is the height of said architecture kernel; (ii) for each of said steps from said first set, find a length for said sub-filters according to an access coverage chart for said data access block; (c) using the architecture kernel and the step corresponding to a minimum of said lengths of step (b)(ii) to filter an input data sequence.
 2. The method of claim 1, wherein: (a) said filter coefficients of step (a) of claim 1 are samples of a windowed sinc function.
 3. The method of claim 1, wherein: (a) said input data sequence is an image; and (b) said filter of step (c) of claim 1 is a horizontal resampling.
 4. A digital camera zoom, comprising: (a) an input for zoom selection; and (b) parallel processing circuitry coupled to said zoom selection input and operable to resample an image by (1) providing filter coefficients according to a resampling ratio dependent upon an input zoom selection, said coefficients grouped into sub-filters according to filter phase and corresponding to a data access block; (2) for each of a plurality of architecture kernels of said parallel processing circuitry (i) provide a step per group of said sub-filters from a first set of integers corresponding to the height of said architecture kernel and said resampling ratio, (ii) for each of said steps from said first set, find a length for said sub-filters according to an access coverage chart for said data access block; and (3) using the architecture kernel and the step corresponding to a minimum of said lengths of step (2)(ii) to filter said image.